AITopics | nonverbal cue

Collaborating Authors

nonverbal cue

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

844f722dbbcb27933ff5baf58a1f00c8-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-10-2026, 09:57:36 GMT

dataset, expression, representation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

Samuels, Thomas Jack, Rugolon, Franco, Hau, Stephan, Högman, Lennart

arXiv.org Artificial IntelligenceDec-12-2025

This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.

artificial intelligence, machine learning, modality, (17 more...)

arXiv.org Artificial Intelligence

2506.21429

Country: Europe > Sweden (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Learning When to Ask: Simulation-Trained Humanoids for Mental-Health Diagnosis

Cenacchi, Filippo, Richards, Deborah, Cao, Longbing

arXiv.org Artificial IntelligenceDec-11-2025

Testing humanoid robots with users is slow, causes wear, and limits iteration and diversity. Yet screening agents must master conversational timing, prosody, backchannels, and what to attend to in faces and speech for Depression and PTSD. Most simulators omit policy learning with nonverbal dynamics; many controllers chase task accuracy while underweighting trust, pacing, and rapport. We virtualise the humanoid as a conversational agent to train without hardware burden. Our agent-centred, simulation-first pipeline turns interview data into 276 Unreal Engine MetaHuman patients with synchronised speech, gaze/face, and head-torso poses, plus PHQ-8 and PCL-C flows. A perception-fusion-policy loop decides what and when to speak, when to backchannel, and how to avoid interruptions, under a safety shield. Training uses counterfactual replay (bounded nonverbal perturbations) and an uncertainty-aware turn manager that probes to reduce diagnostic ambiguity. Results are simulation-only; the humanoid is the transfer target. In comparing three controllers, a custom TD3 (Twin Delayed DDPG) outperformed PPO and CEM, achieving near-ceiling coverage with steadier pace at comparable rewards. Decision-quality analyses show negligible turn overlap, aligned cut timing, fewer clarification prompts, and shorter waits. Performance stays stable under modality dropout and a renderer swap, and rankings hold on a held-out patient split. Contributions: (1) an agent-centred simulator that turns interviews into 276 interactive patients with bounded nonverbal counterfactuals; (2) a safe learning loop that treats timing and rapport as first-class control variables; (3) a comparative study (TD3 vs PPO/CEM) with clear gains in completeness and social timing; and (4) ablations and robustness analyses explaining the gains and enabling clinician-supervised humanoid pilots.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.08952

Country:

Europe > Middle East > Cyprus (0.16)
Oceania > Australia (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language

Lee, Seungbeen, Jeong, Jinhong, Kim, Donghyun, Son, Yejin, Yu, Youngjae

arXiv.org Artificial IntelligenceNov-21-2025

Our ability to interpret others' mental states through nonverbal cues (NVCs) is fundamental to our survival and social cohesion. While existing Theory of Mind (ToM) benchmarks have primarily focused on false-belief tasks and reasoning with asymmetric information, they overlook other mental states beyond belief and the rich tapestry of human nonverbal communication. We present Motion2Mind, a framework for evaluating the ToM capabilities of machines in interpreting NVCs. Leveraging an expert-curated body-language reference as a proxy knowledge base, we build Motion2Mind, a carefully curated video dataset with fine-grained nonverbal cue annotations paired with manually verified psychological interpretations. It encompasses 222 types of nonverbal cues and 397 mind states. Our evaluation reveals that current AI systems struggle significantly with NVC interpretation, exhibiting not only a substantial performance gap in Detection, as well as patterns of over-interpretation in Explanation compared to human annotators.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.15887

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.93)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

MIRROR: Multimodal Cognitive Reframing Therapy for Rolling with Resistance

Kim, Subin, Kim, Hoonrae, Lee, Jihyun, Jeon, Yejin, Lee, Gary Geunbae

arXiv.org Artificial IntelligenceOct-16-2025

Recent studies have explored the use of large language models (LLMs) in psychotherapy; however, text-based cognitive behavioral therapy (CBT) models often struggle with client resistance, which can weaken therapeutic alliance. To address this, we propose a multimodal approach that incorporates nonverbal cues, which allows the AI therapist to better align its responses with the client's negative emotional state. Specifically, we introduce a new synthetic dataset, Mirror (Multimodal Interactive Rolling with Resistance), which is a novel synthetic dataset that pairs each client's statements with corresponding facial images. Using this dataset, we train baseline vision language models (VLMs) so that they can analyze facial cues, infer emotions, and generate empathetic responses to effectively manage client resistance. These models are then evaluated in terms of both their counseling skills as a therapist, and the strength of therapeutic alliance in the presence of client resistance. Our results demonstrate that Mirror significantly enhances the AI therapist's ability to handle resistance, which outperforms existing text-based CBT approaches. Human expert evaluations further confirm the effectiveness of our approach in managing client resistance and fostering therapeutic alliance.

large language model, machine learning, resistance, (19 more...)

arXiv.org Artificial Intelligence

2504.13211

Country:

Asia (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets

Neural Information Processing SystemsAug-16-2025, 14:12:19 GMT

However, existing synthetic data generation tools that provide referring expressions generally neglect nonverbal gestures.

machine learning, natural language, object-oriented architecture, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues

Kim, Youngmin, Chung, Jiwan, Kim, Jisoo, Lee, Sunghyun, Lee, Sangkyu, Kim, Junhyeok, Yang, Cheoljong, Yu, Youngjae

arXiv.org Artificial IntelligenceJun-3-2025

Nonverbal communication is integral to human interaction, with gestures, facial expressions, and body language conveying critical aspects of intent and emotion. However, existing large language models (LLMs) fail to effectively incorporate these nonverbal elements, limiting their capacity to create fully immersive conversational experiences. We introduce MARS, a multimodal language model designed to understand and generate nonverbal cues alongside text, bridging this gap in conversational AI. Our key innovation is VENUS, a large-scale dataset comprising annotated videos with time-aligned text, facial expressions, and body language. Leveraging VENUS, we train MARS with a next-token prediction objective, combining text with vector-quantized nonverbal representations to achieve multimodal understanding and generation within a unified framework. Based on various analyses of the VENUS datasets, we validate its substantial scale and high effectiveness. Our quantitative and qualitative results demonstrate that MARS successfully generates text and nonverbal languages, corresponding to conversational input.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.00958

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SocialMind: LLM-based Proactive AR Social Assistive System with Human-like Perception for In-situ Live Interactions

Yang, Bufang, Guo, Yunqi, Xu, Lilin, Yan, Zhenyu, Chen, Hongkai, Xing, Guoliang, Jiang, Xiaofan

arXiv.org Artificial IntelligenceDec-5-2024

Social interactions are fundamental to human life. The recent emergence of large language models (LLMs)-based virtual assistants has demonstrated their potential to revolutionize human interactions and lifestyles. However, existing assistive systems mainly provide reactive services to individual users, rather than offering in-situ assistance during live social interactions with conversational partners. In this study, we introduce SocialMind, the first LLM-based proactive AR social assistive system that provides users with in-situ social assistance. SocialMind employs human-like perception leveraging multi-modal sensors to extract both verbal and nonverbal cues, social factors, and implicit personas, incorporating these social cues into LLM reasoning for social suggestion generation. Additionally, SocialMind employs a multi-tier collaborative generation strategy and proactive update mechanism to display social suggestions on Augmented Reality (AR) glasses, ensuring that suggestions are timely provided to users without disrupting the natural flow of conversation. Evaluations on three public datasets and a user study with 20 participants show that SocialMind achieves 38.3% higher engagement compared to baselines, and 95% of participants are willing to use SocialMind in their live social interactions.

interaction, social suggestion, socialmind, (16 more...)

arXiv.org Artificial Intelligence

2412.04036

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Florida > Hillsborough County > University (0.04)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.93)
Health & Medicine > Consumer Health (0.92)
Media (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

EXCLUSIVE: I'm a psychologist - here are 9 subtle signs your partner lacks emotional intelligence

Daily Mail - Science & techNov-27-2023, 17:01:46 GMT

Emotional intelligence, or emotional quotient (EQ), has become a buzz phrase in recent years as part of a wider trend of'therapy talk' being used in day-to-day conversations. But relationship experts say a mismatch in EQ is often the source of breakdowns in marriages and partnerships. One of the first problems in most relationships on the rocks is a lack of communication, according to psychologist Dr. Scott Lyons, and a key part of communication is emotional intelligence. 'Emotional intelligence is the ability to recognize, understand, and manage one's own emotions,' he told DailyMail.com. He said that while you might be able to have fun with a romantic partner who has low EQ, the connection may not go much deeper.

emotional intelligence, intelligence, lyon, (13 more...)

Daily Mail - Science & tech

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Add feedback

Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey

Beyan, Cigdem, Vinciarelli, Alessandro, Del Bue, Alessio

arXiv.org Artificial IntelligenceOct-4-2023

Automated co-located human-human interaction analysis has been addressed by the use of nonverbal communication as measurable evidence of social and psychological phenomena. We survey the computing studies (since 2010) detecting phenomena related to social traits (e.g., leadership, dominance, personality traits), social roles/relations, and interaction dynamics (e.g., group cohesion, engagement, rapport). Our target is to identify the nonverbal cues and computational methodologies resulting in effective performance. This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings (free-standing conversations, meetings, dyads, and crowds). We also present a comprehensive summary of the related datasets and outline future research directions which are regarding the implementation of artificial intelligence, dataset curation, and privacy-preserving interaction analysis. Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, and meetings composed of 3-4 persons equipped with microphones and cameras, respectively; multimodal features are prominently performing better; deep learning architectures showed improved performance in overall, but there exist many phenomena whose detection has never been implemented through deep models. We also identified several limitations such as the lack of scalable benchmarks, annotation reliability tests, cross-dataset experiments, and explainability analysis.

co-located human-human interaction analysis, nonverbal cue

arXiv.org Artificial Intelligence

doi: 10.1145/3626516

2207.10574

Genre: Overview (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)

Add feedback